OHI-Northeast | OHI Science | Citation policy
This script calculates catch by OHI region using landings data provided by NOAA.
Downloaded: July, 2019
Description: Commercial fish landings by statistical area
Time range: 1996-2017. Data provided annually
Format: Excel spreadsheet
Cleaning the raw data a bit by fixing column names and turning stat_area numeric.
raw <- read_excel(file.path(dir_anx, "_raw_data/NOAA_NMFS/catch_by_stat_area/Afflerbach_UCSB_Landings by Stat Area_JUL 2019 Updated.xlsx"))
clean <- raw %>%
rename(year = YEAR,
stat_area = `STAT\r\nAREA`,
species = SPECIES,
pounds = `LBS LANDED \r\n(HAIL WT)`,
stock_id = `STOCK ID`,
stock = `STOCK NAME`) %>%
mutate(stat_area = as.numeric(stat_area))
head(clean)## # A tibble: 6 x 6
## year stat_area species pounds stock_id stock
## <dbl> <dbl> <chr> <dbl> <chr> <chr>
## 1 1996 0 CONFIDENTIAL SPECIES 9242135 <NA> <NA>
## 2 1996 462 COD 11827 CODGMSS GOM Cod
## 3 1996 462 CUSK 1065 <NA> <NA>
## 4 1996 462 FLOUNDER, AMERICAN PLAICE … 21191 PLAGMMA Plaice
## 5 1996 462 FLOUNDER, WITCH / GRAY SOLE 5496 WITGMMA Witch Floun…
## 6 1996 462 HADDOCK 296 HADGM GOM Haddock
There are a total of 154 reported species, which include groupings such as “Confidential Species” or “Clam, species not specified”. Some of these species, such as Atlantic Cod, have multiple stocks.
Since the data is provided by statistical landing area, we can use this information to infer what OHI region’s encompass or overlap with these areas. We have downloaded the shapefile for Statistical Areas from this public FTP NOAA site.
Load in the statistical areas and add area of each polygon as a column.
stat_shp <- sf::read_sf(file.path(dir_anx, "spatial/Statistical_Areas_2010_withNames.shp")) %>%
st_set_crs(p4s_nad83) %>% #set the CRS
st_transform(crs = crs(rgns)) #transform to our NE specific CRS
stat_shp$stat_area <- st_area(stat_shp) #add area as column
ggplot(stat_shp) +
geom_sf() +
theme_bw() +
labs(title = "Statistical areas") +
theme(legend.position = "none")We overlay statistical areas with our regions to select just the statistical areas that overlap with our regions.
ohi_stat_areas <- st_intersection(rgns, stat_shp) #intersects statistical areas with OHI regions
ohi_stat_areas$ohi_area <- st_area(ohi_stat_areas) #calculate area of each overlapped polygon
ggplot(ohi_stat_areas) +
geom_sf(aes(fill = rgn_name)) +
geom_sf(data = ne_states, fill = "beige") +
theme_bw() +
labs(title = "Statistical areas overlapped with OHI regions") Calculate proportion of each statistical area in our OHI regions. For statistical areas that overlap with OHI regions, we can use proportional area overlap to adjust catch. We assume that catch is evenly distributed across each statistical area.
calc_prop_area <- ohi_stat_areas %>%
group_by(Id) %>%
mutate(ohi_rgn_prop_area = ohi_area/stat_area) #this column tells us how much of each OHI sub-region falls within the statistical area in our region
ggplot(calc_prop_area) +
geom_sf(aes(fill = ohi_rgn_prop_area)) +
geom_sf(data = ne_states, fill = "beige") +
theme_bw() +
labs(title = "Proportion of each \nstatistical area in OHI region") Now we calculate the total catch per species and year for each of the OHI regions.
First let’s filter the catch data to just the statistical areas in our region. We don’t care about the catch outside of these statistical areas.
region_catch <- clean %>%
filter(stat_area %in% ohi_stat_areas$Id) %>%
left_join(calc_prop_area, by = c("stat_area" = "Id")) %>%
mutate(catch = pounds*ohi_rgn_prop_area) %>% #adjusting catch by the proportional area with overlap
select(-area_km2, -FULL_NAME, -SHORT_NAME, -stat_area.y, -ohi_area, -NAFODIV) %>%
group_by(species, stock_id, stock, rgn_id, year, rgn_name) %>%
summarize(catch = sum(catch)) %>%
ungroup() %>%
mutate(display_name = ifelse(is.na(stock_id), species, stock_id))
head(region_catch, n = 20)## # A tibble: 20 x 8
## species stock_id stock rgn_id year rgn_name catch display_name
## <chr> <chr> <chr> <dbl> <dbl> <fct> <dbl> <chr>
## 1 ALEWIFE <NA> <NA> 3 1998 Gulf of Maine 2627. ALEWIFE
## 2 ALEWIFE <NA> <NA> 3 1999 Gulf of Maine 43.4 ALEWIFE
## 3 ALEWIFE <NA> <NA> 3 2000 Gulf of Maine 241. ALEWIFE
## 4 ALEWIFE <NA> <NA> 3 2003 Gulf of Maine 46.6 ALEWIFE
## 5 ALEWIFE <NA> <NA> 3 2006 Gulf of Maine 0 ALEWIFE
## 6 ALEWIFE <NA> <NA> 3 2007 Gulf of Maine 0 ALEWIFE
## 7 ALEWIFE <NA> <NA> 4 2012 Mid-Atlantic B… 562. ALEWIFE
## 8 ALEWIFE <NA> <NA> 4 2013 Mid-Atlantic B… 5806. ALEWIFE
## 9 ALEWIFE <NA> <NA> 4 2014 Mid-Atlantic B… 0 ALEWIFE
## 10 ALEWIFE <NA> <NA> 4 2015 Mid-Atlantic B… 0 ALEWIFE
## 11 ALEWIFE <NA> <NA> 4 2016 Mid-Atlantic B… 0 ALEWIFE
## 12 ALEWIFE <NA> <NA> 4 2017 Mid-Atlantic B… 0 ALEWIFE
## 13 ALEWIFE <NA> <NA> 5 2013 Connecticut 0 ALEWIFE
## 14 ALEWIFE <NA> <NA> 5 2014 Connecticut 0 ALEWIFE
## 15 ALEWIFE <NA> <NA> 5 2016 Connecticut 0 ALEWIFE
## 16 ALEWIFE <NA> <NA> 5 2017 Connecticut 0 ALEWIFE
## 17 ALEWIFE <NA> <NA> 6 1998 Maine 563. ALEWIFE
## 18 ALEWIFE <NA> <NA> 6 1999 Maine 9.31 ALEWIFE
## 19 ALEWIFE <NA> <NA> 6 2000 Maine 51.7 ALEWIFE
## 20 ALEWIFE <NA> <NA> 6 2003 Maine 10.00 ALEWIFE
total_catch <- region_catch %>%
group_by(year) %>%
summarize(catch = sum(catch))
ggplot(total_catch, aes(x = year, y = catch)) +
geom_area() +
labs(x = "",
y = "Pounds",
title= "Total catch in the Northeast") +
theme_bw()The data shared with us includes records of 0 catch. But there is still missing data. As an example, let’s look at ALEWIFE.
## [1] 1998 1999 2000 2003 2006 2007 2012 2013 2014 2015 2016 2017
Ok clearly we are missing data for 2001, 2002, 04-05, 2008-11. We don’t know if these are 0’s or missing data. We need to gapfill this missing data. When a species/state combination has missing data for a year, we can not assume it has a catch of 0. Since we calculate a rolling average of catch, NAs will remain as NA’s and the average will rely on just one or two years of catch. This is done to account for any wild fluctuations in catch year to year.
gf_data <- region_catch %>%
group_by(rgn_id, rgn_name, species, stock, stock_id, display_name) %>%
complete(year = 1998:2017) %>%
arrange(year) %>%
mutate(mean_catch = zoo::rollapply(catch, 3, mean, fill = NA, align = 'right')) %>% ## create a new column `mean_catch` with rolling mean of 3 yrs
filter(year > 2004) %>%
select(year, rgn_id, rgn_name, species, stock, stock_id, mean_catch, display_name) %>%
ungroup()
p <- ggplot(gf_data, aes(x = year, y = mean_catch, fill = display_name)) +
geom_bar(stat = "identity") +
facet_wrap(~rgn_name, scales = "free_y") +
theme_bw() +
theme(legend.text = element_text(size = 6),
legend.position = "below",
axis.text = element_text(size = 6),
axis.text.x = element_text(angle = 45))
plotly::ggplotly(p)Let’s look at total regional catch for each species (not stock)
#calculate total regional catch per species
species_catch <- gf_data %>%
group_by(species, year) %>%
summarize(sp_catch = sum(mean_catch, na.rm=T)) %>%
ungroup() %>%
group_by(year) %>%
mutate(yr_catch = sum(sp_catch),
catch_prop = sp_catch/yr_catch) %>%
ungroup() %>%
filter(year > 2004) p <- ggplot(species_catch, aes(x = year, y = catch_prop, fill = species)) +
geom_bar(stat = "identity") +
theme_bw() +
theme(legend.text = element_text(size = 6))
plotly::ggplotly(p)Clearly atlantic herring is making up the majority of catch! Atlantic herring is primarily a bait fishery, so we need to account for that since this goal is only measuring catch meant for human consumption. We adjust for this below.
Some of these species are harvested for food as well as other markets like pet food or bait. We want to make sure this goal captures catch meant for human consumption. We have data from NOAA that identifies the amount of catch per species, state and year meant for food, bait, and other markets. This data was cleaned in prop_catch_food_bait.Rmd.
prop_data <- read_csv("data/fish_catch_food_prop_rgn.csv")
ggplot(prop_data %>% filter(market == "BAIT"), aes(x = year, y = pounds_live_by_market, color = species)) +
geom_line() +
theme_bw() +
theme(legend.title = element_blank()) +
labs(x = "",
y = "Catch (pounds)",
title = "Total amount of species catch sold as bait \n(not for direct human consumption)")toolbox_data <- gf_data %>%
left_join(prop_data) %>%
mutate(prop = ifelse(is.na(prop),1,prop),
mean_catch_times_prop = mean_catch*prop) %>%
filter(!market %in% c("BAIT", "NO MARKET", "CANNED PET FOOD"), #remove bait, pet food and no market records.
!is.na(mean_catch)) %>% #remove records with no catch (don't need them)
select(-market, -pounds_live_by_market, -total_pounds_live, -prop)tot <- toolbox_data %>%
group_by(year, rgn_name) %>%
summarize(catch = sum(mean_catch_times_prop))
ggplot(tot, aes(x = year, y = catch, fill = rgn_name)) +
geom_area() +
scale_fill_viridis_d() +
theme_bw() +
theme(legend.title = element_blank()) +
labs(x = "",
y = "Pounds",
title = "Total catch in pounds meant for human consumption, by OHI region")The raw data from NMFS comes with species names displayed like “SCALLOP, SEA” instead of Sea Scallop. This little fix changes the species names to a more user-friendly version.